Semantic Textual Similarity: past, present and future
Abstract
Similarity is at the core of scientific inquiry in general and is one of the basic functionalities in Natural Language Processing (NLP) in particular. To arrive at generalizations across different phenomena and make scientific claims, we need to recognize patterns of similarity or divergence. Semantic textual similarity (STS) plays a significant role in NLP research, both directly and indirectly. For example, document summarization must compress redundant information, which requires identifying where the text is similar; question answering must recognize the similarity between questions and answers; textual similarity is an important component of an entailment system; evaluating machine translation (MT) output relies on calculating the similarity between the system’s output and reference gold translations; and text generation technology benefits from sentence similarity when generating different expressions. In this talk, I will address the problem of textual semantic similarity. We have run two major STS tasks over the span of two years: within SemEval in 2012 and as the *SEM shared task in 2013. To date, the task is one of the most successful carried out within our community, judging by its popularity. I will share the details of the task, some interesting insights into the scientific merits of this enterprise, and lessons learned. Finally, I will share some thoughts on the future.
Similar resources
ExB Themis: Extensive Feature Extraction from Word Alignments for Semantic Textual Similarity
We present ExB Themis – a word alignment-based semantic textual similarity system developed for SemEval-2015 Task 2: Semantic Textual Similarity. It combines both string and semantic similarity measures as well as alignment features using Support Vector Regression. It occupies the first three places on Spanish data and additionally places second on English data. ExB Themis proved to be the best ...
Measuring Semantic Similarity for Bengali Tweets Using WordNet
Similarity between natural language texts or sentences in terms of meaning, known as textual entailment, is a generic problem in the area of computational linguistics. In the last few years, researchers have worked on various aspects of the textual entailment problem, but mostly restricted to the English language. In this paper, we present a method for measuring the semantic similarity of Bengali tweets us...
Verbs in Applied Linguistics Research Article Introductions: Semantic and syntactic analysis
This study aims to investigate the semantic and syntactic features of verbs used in the introduction section of Applied Linguistics research articles published in Iranian and international journals. A corpus of 20 research article introductions (10 from each journal) was used. The corpus was analysed for the syntactic features (tense, aspect and voice) and semantic meaning of verbs. The finding...
Learning Shallow Semantic Rules for Textual Entailment
In this paper we present a novel technique for integrating lexical-semantic knowledge in systems for learning textual entailment recognition rules: the typed anchors. These describe the semantic relations between words across an entailment pair. We integrate our approach in the cross-pair similarity model. Experimental results show that our approach increases performance of cross-pair similarit...
Publication date: 2013